{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cbba56a9",
   "metadata": {},
   "source": [
    "# Vector Similarity for RediSearch\n",
    "\n",
    "This notebook describes the vector similarity capabilities that RediSearch offers in the private preview build:\n",
    "1. Realtime vector indexing\n",
    "2. Realtime vector update\n",
    "3. Realtime vector deletion\n",
    "4. TOP-K query\n",
    "\n",
    "## Indexing capabilities\n",
    "The private preview build supports two index algorithms and three distance metrics:\n",
    "\n",
    "### Index algorithms\n",
    "1. Brute force (Flat Index)\n",
    "2. HNSW (Hierarchical Navigable Small World)\n",
    "\n",
    "### Distance metrics\n",
    "1. L2 - Euclidean distance between two vectors\n",
    "2. IP - Inner product of two vectors\n",
    "3. COSINE - Cosine similarity of two vectors"
   ]
  },
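  {
   "cell_type": "markdown",
   "id": "d4e2f1a7",
   "metadata": {},
   "source": [
    "For reference, the three metrics above have the following textbook definitions for two vectors $u, v \\in \\mathbb{R}^d$. These are the standard formulas only; this notebook does not specify whether the score returned by the preview build is one of these values directly or a transformation of it (for example, a squared distance).\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "d_{\\mathrm{L2}}(u, v) &= \\sqrt{\\sum_{i=1}^{d} (u_i - v_i)^2} \\\\\n",
    "\\mathrm{IP}(u, v) &= \\sum_{i=1}^{d} u_i v_i \\\\\n",
    "\\cos(u, v) &= \\frac{\\sum_{i=1}^{d} u_i v_i}{\\lVert u \\rVert_2 \\, \\lVert v \\rVert_2}\n",
    "\\end{aligned}\n",
    "$$"
   ]
  },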
  {
   "cell_type": "markdown",
   "id": "ac6e5a20",
   "metadata": {},
   "source": [
    "## Creating an index\n",
    "To create a vector index, invoke the index creation command `FT.CREATE` and declare the vector field with the new reserved word `VECTOR`.\n",
    "\n",
    "Command format:\n",
    "```\n",
    "FT.CREATE <index_name> SCHEMA <vector field name> VECTOR <type> <dimension> <distance metric> <index algorithm> <algorithm parameters>\n",
    "```\n",
    "### General indexing mandatory parameters\n",
    "Pass the following parameters to `FT.CREATE`, in this order:\n",
    "* type - vector data type. Currently only `FLOAT32` is supported.\n",
    "* dimension - vector dimension.\n",
    "* distance metric - one of `L2` for Euclidean distance, `IP` for inner product, or `COSINE` for cosine similarity. Note that when `COSINE` is selected, the indexed vectors are normalized upon indexing and the query vector is normalized upon query.\n",
    "* index algorithm - either `BF` for brute force or `HNSW` for the HNSW indexing algorithm.\n",
    "\n",
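    "Normalizing both the indexed vectors and the query vector is what lets a plain inner product stand in for cosine similarity. The following is a minimal pure-Python sketch of that idea, not RediSearch code; the helper names `normalize` and `inner_product` are hypothetical:\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def normalize(vec):\n",
    "    # Scale vec to unit Euclidean length (hypothetical helper).\n",
    "    norm = math.sqrt(sum(x * x for x in vec))\n",
    "    if norm == 0.0:\n",
    "        raise ValueError('cannot normalize a zero vector')\n",
    "    return [x / norm for x in vec]\n",
    "\n",
    "\n",
    "def inner_product(u, v):\n",
    "    # Sum of element-wise products of two equal-length vectors.\n",
    "    return sum(a * b for a, b in zip(u, v))\n",
    "\n",
    "\n",
    "u, v = [3.0, 4.0], [4.0, 3.0]\n",
    "# Cosine similarity computed directly from the raw vectors.\n",
    "cosine = inner_product(u, v) / (math.hypot(*u) * math.hypot(*v))\n",
    "# The same value, obtained as the inner product of the normalized vectors.\n",
    "assert math.isclose(cosine, inner_product(normalize(u), normalize(v)))\n",
    "```\n",
    "\n",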
    "### Brute force (Flat index)\n",
    "This index compares the entire indexed vector data to the query vector and returns the top-k similar vectors, according to the given distance metric.\n",
    "\n",
    "#### Index specific parameters\n",
    "* `INITIAL_CAP` - initial index capacity (number of vectors). The index pre-allocates space for this many vectors, so no additional allocations happen while indexing (an optimization).\n",
    "\n",
    "An example of creating a brute force index with an initial capacity of 1 million vectors of 128 floats each, using the L2 distance metric:\n",
    "\n",
    "```\n",
    "FT.CREATE my_flat_index SCHEMA my_vector_field VECTOR FLOAT32 128 L2 BF INITIAL_CAP 1000000\n",
    "```\n",
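    "\n",
    "Conceptually, a flat index answers a top-k query by scoring every stored vector against the query vector. The sketch below illustrates that idea in pure Python; it is not RediSearch's implementation, the names `l2_squared` and `brute_force_top_k` are hypothetical, and using the squared Euclidean distance as the score is an assumption of this sketch:\n",
    "\n",
    "```python\n",
    "import heapq\n",
    "\n",
    "\n",
    "def l2_squared(u, v):\n",
    "    # Squared Euclidean distance between two equal-length vectors.\n",
    "    return sum((a - b) ** 2 for a, b in zip(u, v))\n",
    "\n",
    "\n",
    "def brute_force_top_k(vectors, query, k):\n",
    "    # vectors maps a document id to its vector; every vector is scanned.\n",
    "    # Returns the k closest entries as (distance, doc_id), nearest first.\n",
    "    scored = ((l2_squared(vec, query), doc_id) for doc_id, vec in vectors.items())\n",
    "    return heapq.nsmallest(k, scored)\n",
    "\n",
    "\n",
    "vectors = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [5.0, 5.0]}\n",
    "print(brute_force_top_k(vectors, [0.9, 0.9], k=2))\n",
    "```\n",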
    "\n",
    "### HNSW\n",
    "This index algorithm is a modified version of [nmslib/hnswlib](https://github.com/nmslib/hnswlib), which is the authors' implementation of [Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs](https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf).\n",
    "\n",
    "#### Index specific parameters\n",
    "* `INITIAL_CAP` - initial index capacity (number of vectors). The index pre-allocates space for this many vectors, so no additional allocations happen while indexing (an optimization).\n",
    "* `M` - maximum number of outbound connections per node in the graph.\n",
    "* `EF` - maximum number of potential candidates to consider for connection while building the graph.\n",
    "\n",
    "An example of creating an HNSW index with an initial capacity of 1 million vectors of 128 floats each, using the L2 distance metric, with `EF` set to 200 and `M` set to 40:\n",
    "\n",
    "```\n",
    "FT.CREATE my_hnsw_index SCHEMA my_vector_field VECTOR FLOAT32 128 L2 HNSW INITIAL_CAP 1000000 M 40 EF 200\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64ea03db",
   "metadata": {},
   "source": [
    "## Query\n",
    "To execute a top-k vector query, invoke the search command `FT.SEARCH` with the vector blob passed as a parameter.\n",
    "\n",
    "Command format:\n",
    "```\n",
    "FT.SEARCH <index name> \"@<vector field name>:[$<vector blob parameter name> TOPK <k>]\" RETURN 1 <vector field name>_score SORTBY <vector field name>_score LIMIT 0 <k> PARAMS 2 <vector blob parameter name> <vector blob>\n",
    "```\n",
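    "\n",
    "The `<vector blob>` is the raw byte representation of the vector's `FLOAT32` components; the Python examples in this notebook produce it with NumPy's `astype(np.float32).tobytes()`. As a rough standard-library sketch of the same layout, assuming little-endian byte order (which is what NumPy's native `float32` yields on little-endian machines), with `to_float32_blob` being a hypothetical helper name:\n",
    "\n",
    "```python\n",
    "import struct\n",
    "\n",
    "\n",
    "def to_float32_blob(values):\n",
    "    # Pack the numbers as consecutive little-endian 32-bit floats.\n",
    "    return struct.pack(f'<{len(values)}f', *values)\n",
    "\n",
    "\n",
    "blob = to_float32_blob([0.5, 1.0, -2.0])\n",
    "assert len(blob) == 3 * 4  # 4 bytes per FLOAT32 component\n",
    "assert struct.unpack('<3f', blob) == (0.5, 1.0, -2.0)\n",
    "```\n",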
    "\n",
    "### Query tuning parameters\n",
    "#### HNSW\n",
    "* `EFRUNTIME` - maximum number of potential top-k candidates to collect while querying the graph. `EFRUNTIME` should be greater than or equal to `k`.\n",
    "\n",
    "An example of a top-10 query over an HNSW-indexed dataset with `EFRUNTIME` set to 150:\n",
    "\n",
    "```\n",
    "FT.SEARCH my_hnsw_index \"@my_vector_field:[$vec TOPK 10] => {$EFRUNTIME:150}\" RETURN 1 my_vector_field_score SORTBY my_vector_field_score LIMIT 0 10 PARAMS 2 vec <vector blob>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "323aec7f",
   "metadata": {},
   "source": [
    "## Python examples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19bdc2a5-2192-4f5f-bd6e-7c956fd0e230",
   "metadata": {},
   "source": [
    "### Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "4dbaf749",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install git+https://github.com/RediSearch/redisearch-py.git@params\n",
    "!pip install numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "09a8f030",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from redis import Redis\n",
    "import redisearch"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8c6ef53",
   "metadata": {},
   "source": [
    "Create the Redis client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "370c1fcc",
   "metadata": {},
   "outputs": [],
   "source": [
    "host = \"localhost\"\n",
    "port = 6379\n",
    "\n",
    "redis_conn = Redis(host = host, port = port)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "06c421de-00ee-42c5-8487-b46acd02950a",
   "metadata": {},
   "outputs": [],
   "source": [
    "n_vec = 10000\n",
    "dim = 128\n",
    "M = 40\n",
    "EF = 200\n",
    "vector_field_name = \"vector\"\n",
    "k = 10\n",
    "hnsw_EFRUNTIME = 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "da997470-4e8d-4d94-9c90-5aa009415699",
   "metadata": {},
   "outputs": [],
   "source": [
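    "def load_vectors(client: Redis, n, d, field_name):\n",
    "    # Store n random float32 vectors of dimension d as hashes keyed 0..n-1.\n",
    "    for i in range(n):\n",
    "        np_vector = np.random.rand(1, d).astype(np.float32)\n",
    "        client.hset(i, field_name, np_vector.tobytes())\n",
    "\n",
    "\n",
    "def delete_data(client: Redis):\n",
    "    # FLUSHALL deletes every key in the Redis instance, not only the vectors.\n",
    "    client.flushall()"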
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1da8d1ed",
   "metadata": {},
   "source": [
    "### Brute Force"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b915282d-eeb9-44e8-a180-14b1c3ddba14",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[6278, 8574, 5487, 1069, 4578, 3150, 2870, 9628, 5718, 6263]\n",
      "[14.2357501984, 14.7951850891, 15.0644397736, 15.0954351425, 15.1015148163, 15.1202888489, 15.1720476151, 15.2158946991, 15.23898983, 15.4573745728]\n"
     ]
    }
   ],
   "source": [
    "# Build the index\n",
    "bf_index = redisearch.Client(\"my_flat_index\", conn=redis_conn)\n",
    "bf_index.redis.execute_command(\"FT.CREATE\", \"my_flat_index\", \"SCHEMA\", vector_field_name, \"VECTOR\", \"FLOAT32\", dim, \"L2\", \"BF\", \"INITIAL_CAP\", n_vec)\n",
    "# Load the vectors\n",
    "load_vectors(bf_index.redis, n_vec, dim, vector_field_name)\n",
    "# Query: top-k nearest neighbors of a random vector\n",
    "score_field = f'{vector_field_name}_score'\n",
    "query_vector = np.random.rand(1, dim).astype(np.float32)\n",
    "q = redisearch.Query(f'@{vector_field_name}:[$vec_param TOPK {k}]').sort_by(score_field).paging(0, k).return_field(score_field)\n",
    "res = bf_index.search(q, query_params={'vec_param': query_vector.tobytes()})\n",
    "docs = [int(doc.id) for doc in res.docs]\n",
    "# Read the score by field name so the cell still works if vector_field_name changes\n",
    "rs_dists = [float(getattr(doc, score_field)) for doc in res.docs]\n",
    "print(docs)\n",
    "print(rs_dists)\n",
    "# Clean up\n",
    "delete_data(bf_index.redis)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "223d4a22-41bd-44cb-9c6f-02c16c07d5f2",
   "metadata": {},
   "source": [
    "### HNSW"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "3266495a-d2e1-450a-9590-959b368f013c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[9477, 7006, 7913, 9990, 6810, 6923, 1047, 6816, 3644, 164]\n",
      "[13.624666214, 13.6373023987, 14.4420623779, 14.5344333649, 14.5973854065, 14.7105751038, 14.8035182953, 14.9341831207, 15.0296001434, 15.0314159393]\n"
     ]
    }
   ],
   "source": [
    "# Build the index\n",
    "hnsw_index = redisearch.Client(\"my_hnsw_index\", conn=redis_conn)\n",
    "hnsw_index.redis.execute_command(\"FT.CREATE\", \"my_hnsw_index\", \"SCHEMA\", vector_field_name, \"VECTOR\", \"FLOAT32\", dim, \"L2\", \"HNSW\", \"INITIAL_CAP\", n_vec, \"M\", M, \"EF\", EF)\n",
    "# Load the vectors\n",
    "load_vectors(hnsw_index.redis, n_vec, dim, vector_field_name)\n",
    "# Query: top-k nearest neighbors of a random vector, tuned with EFRUNTIME\n",
    "score_field = f'{vector_field_name}_score'\n",
    "query_vector = np.random.rand(1, dim).astype(np.float32)\n",
    "q = redisearch.Query(f'@{vector_field_name}:[$vec_param TOPK {k}]  => {{$EFRUNTIME : {hnsw_EFRUNTIME}}}').sort_by(score_field).paging(0, k).return_field(score_field)\n",
    "res = hnsw_index.search(q, query_params={'vec_param': query_vector.tobytes()})\n",
    "docs = [int(doc.id) for doc in res.docs]\n",
    "# Read the score by field name so the cell still works if vector_field_name changes\n",
    "rs_dists = [float(getattr(doc, score_field)) for doc in res.docs]\n",
    "print(docs)\n",
    "print(rs_dists)\n",
    "# Clean up\n",
    "delete_data(hnsw_index.redis)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
